Causal networks with selectively influenced components

Report on the part under Richard Schweickert’s supervision


Considerable  evidence  indicates  that  the  mental  processes  involved  in  performing  some 
tasks,  such  as  recall  of  items  from  a  list,  are  arranged  in  a  processing  tree  (Batchelder  &  Riefer, 
1999).  In  such  a  tree,  each  process  is  represented  by  a  vertex  and  each  possible  outcome  of  a 
process  is  represented  by  an  arc  descending  from  the  vertex  representing  it.  Many  causal 
networks  can  be  represented  as  processing  trees  (Schweickert  &  Chen,  July,  2008).  Responses 
are represented by terminal vertices that have no descendents. Processing trees are usually used
to  model  the  probabilities  of  the  various  possible  responses.  Typically  an  investigator  proposes 
such a tree, estimates parameters, and tests it through goodness of fit. Part of the work on
the grant further develops another approach, Tree Inference (Schweickert & Chen, 2008).
With  Tree  Inference,  a  processing  tree  is  not  proposed  ahead  of  time.  Instead,  the  investigator 
manipulates  experimental  factors,  such  as  the  number  of  items  to  be  recalled  and  the  delay 
between  presentation  and  recall.  A  factor  is  said  to  selectively  influence  a  vertex  if  it  changes 
parameters  associated  with  the  descendents  of  that  vertex  and  no  other.  If  a  factor  selectively 
influences  a  vertex  we  also  say  it  selectively  influences  the  process  represented  by  that  vertex. 

In  an  experiment  with  two  factors,  the  investigator  can  test  whether  each  factor  selectively 
influences a different vertex. If so, the form of a processing tree accounting for the data can be
determined.

Prior  to  the  work  on  the  grant,  processing  trees  were  not  used  for  modeling  reaction  time, 
and  there  were  three  limitations  to  Tree  Inference.  1)  It  was  applicable  to  experiments  with  only 
two  possible  responses  (e.g.,  correct  or  wrong).  2)  Parameters  associated  with  the  arcs  of  a 
processing tree were probabilities bounded above by 1. 3) An experimental factor was required
to  have  an  effect  at  only  one  vertex.  These  three  limitations  have  now  been  overcome 
(Schweickert  &  Xi,  2011).  Now,  for  example,  rates  of  responding  can  be  analyzed  in  addition  to 
probabilities  of  responses. 

A  pair  of  vertices  can  be  related  in  one  of  two  ways  in  a  processing  tree.  There  may  be  a 
path from the root to a terminal vertex that passes through both vertices. In that case the vertices
are  said  to  be  ordered.  If  there  is  no  such  path,  the  vertices  are  said  to  be  unordered.  Qualitative 
tests  have  been  developed,  through  work  on  the  grant,  that  allow  an  investigator  to  test  whether 
two experimental factors selectively influence two ordered vertices, and if so, determine their
order.  Processing  trees  were  found  to  account  well  for  data  in  the  literature  on  immediate 
ordered  recall  and  on  effects  of  sleep  and  retroactive  interference  (Schweickert,  Fisher  &  Sung, 
in  press). 

Processing  trees  were  originally  developed  for  analyzing  response  probabilities,  not 
response  times.  Through  recent  work  on  the  grant,  processing  trees  can  now  be  inferred  from  a 
joint analysis of response time and accuracy. New theorems provide necessary and sufficient
conditions  for  reaction  time  data  to  be  generated  from  an  experiment  in  which  two  factors 
selectively  influence  two  different  processes. 

Although  processing  trees  describe  well  the  organization  of  mental  processes  for  some 
tasks,  there  is  no  reason  to  expect  processes  to  be  organized  in  the  same  way  for  all  tasks.  In 
some  tasks,  there  is  evidence  that  the  processes  are  organized  in  a  directed  acyclic  network  (a 
critical  path  network).  In  earlier  work  (Schweickert,  1978),  a  method  was  developed  for 
analyzing  reaction  times  to  test  whether  two  experimental  factors  selectively  influence  two 
different  processes  in  a  directed  acyclic  network.  If  the  test  was  passed,  part  of  the  network 
could  be  inferred  from  the  data.  In  particular,  an  investigator  could  determine  whether  the  two 
selectively  influenced  processes  are  sequential  (ordered  in  the  network)  or  concurrent 
(unordered). 
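
The distinction between sequential and concurrent processes can be illustrated with a toy critical path network. The sketch below is only an illustration under stated assumptions: the process names, durations, and the two network layouts are hypothetical, and the response time is taken to be the length of the longest path through the network. It computes the interaction contrast produced by prolonging two processes X and Y, once when they are concurrent and once when they are sequential.

```python
from functools import lru_cache


def completion_time(durations, predecessors):
    """Length of the longest (critical) path through a DAG of processes.

    durations: dict, process name -> duration in ms (hypothetical values)
    predecessors: dict, process name -> tuple of processes that must finish first
    """
    @lru_cache(maxsize=None)
    def finish(process):
        start = max((finish(p) for p in predecessors.get(process, ())), default=0)
        return start + durations[process]

    return max(finish(process) for process in durations)


def interaction_contrast(base, predecessors, x, y, dx, dy):
    """RT(both prolonged) - RT(x prolonged) - RT(y prolonged) + RT(baseline)."""
    def rt(px, py):
        d = dict(base)
        d[x] += px
        d[y] += py
        return completion_time(d, predecessors)

    return rt(dx, dy) - rt(dx, 0) - rt(0, dy) + rt(0, 0)


base = {"encode": 100, "X": 300, "Y": 250, "respond": 150}

# X and Y both follow "encode" and both precede "respond": concurrent (unordered).
concurrent = {"X": ("encode",), "Y": ("encode",), "respond": ("X", "Y")}
# Y must wait for X: sequential (ordered).
sequential = {"X": ("encode",), "Y": ("X",), "respond": ("Y",)}

contrast_concurrent = interaction_contrast(base, concurrent, "X", "Y", 100, 100)
contrast_sequential = interaction_contrast(base, sequential, "X", "Y", 100, 100)
print(contrast_concurrent, contrast_sequential)  # -50 0
```

In this toy case the two prolongations combine additively when X and Y are sequential and less than additively when they are concurrent, which is the kind of pattern the reaction time test exploits.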

Ordinarily  for  a  given  data  set,  if  one  directed  acyclic  network  can  account  for  the  data, 
then  several  different  networks  can  account  for  the  data  as  well.  There  are  two  ways  to 
determine the form of a directed acyclic network that accounts for the reaction time data when
factors  selectively  influence  processes  in  the  network.  One  way  is  quantitative,  through  analysis 
of  the  slacks  in  the  network  (Schweickert,  1978).  Another  way  is  qualitative,  through  analysis  of 
which  pairs  of  processes  are  sequential  and  which  are  concurrent,  using  the  Transitive 
Orientation  Algorithm  (e.g.,  Golumbic,  1980).  Each  method  generates  a  set  of  possible 
networks  that  can  account  for  the  data,  so  the  question  arises  of  whether  the  set  of  possible 
networks generated by one method is more restricted than the set generated by the other. Work
on the grant shows that these sets are the same when a serial-parallel network accounts for
the  data.  In  other  words,  the  uniqueness  of  directed  acyclic  networks  inferred  from  the  effects  on 
reaction  time  of  factors  selectively  influencing  processes  has  now  been  characterized  for  serial- 
parallel  networks.  A  case  remains  to  be  characterized,  that  of  networks  containing  a  subnetwork 
in  the  form  of  a  Wheatstone  bridge. 

One  of  the  products  of  the  grant  is  a  book,  planned  for  appearance  in  early  2012,  on 
inferring  cognitive  architecture  by  selectively  influencing  mental  processes. 

A manuscript related to the grant is now under review (Schweickert, Fortin, Xi & Viau-Quesnel).
Because the data are not publicly available, they are summarized in the Appendix.

Appendix

Manuscript  of  Schweickert,  Fortin,  Xi  &  Viau-Quesnel  (Submitted):  Method  and  Results  in  Brief 

Method.  In  the  experiment  summarized  here,  the  participant  began  by  memorizing  two 
memory  sets,  a  set  of  words  and  a  set  of  consonants.  One  memory  set  was  presented  again  at  the 
start  of  each  trial.  Items  in  this  memory  set  were  said  to  be  from  the  active  pool. 

In  the  Reaction  Time  Condition,  on  each  trial,  a  probe  was  presented  and  the  participant's 
task  was  to  press  a  button  to  indicate  whether  the  probe  was  present  in  one  of  the  memory  sets  or 
not.  Two  factors  were  varied  from  trial  to  trial:  the  presence  or  absence  of  the  probe  in  the 
memory  set  and  whether  the  probe  was  from  the  active  memory  pool  or  the  inactive  memory 
pool.  A  third  factor  was  varied  between  blocks  of  trials,  whether  the  active  and  inactive  memory 
set  contained  three  and  six  items,  respectively,  or  contained  six  and  three  items,  respectively. 
Reaction  time  and  accuracy  were  measured. 

In  the  Time  Production  Condition,  the  procedure  was  the  same,  except  as  follows.  Prior 
to  the  memory  search  trials,  participants  were  trained  to  produce  a  time  interval  of  2.4  seconds. 
On  the  later  memory  search  trials,  the  participant  was  instructed  to  respond  when  he  or  she 
judged  that  a  2.4  second  interval  had  elapsed  since  presentation  of  the  probe. 

Results.  There  were  two  main  results.  First,  in  the  Reaction  Time  Condition,  reaction 
times  increased  when  the  inactive  memory  set  was  searched  and  increased  when  the  set  size  was 
larger.  The  combined  effect  of  these  two  factors  was  additive.  The  probability  of  a  correct 
response  decreased  when  the  inactive  memory  set  was  searched  and  decreased  when  the  set  size 
was  larger.  The  combined  effect  of  these  two  factors  was  multiplicative.  A  simple  processing 
tree  accounts  for  the  data  well,  in  which  activation  of  the  memory  set  is  followed  by  searching 
the  memory  set.  The  durations  of  the  processes  add  and  the  probabilities  that  the  processes  are 
correct  multiply. 

The  second  main  result  is  in  the  Time  Production  Condition.  Time  intervals  produced  by 
the  participants  were  longer  when  the  size  of  the  memory  set  to  be  searched  was  larger.  The 
memory  search  interfered  with  timing.  However,  the  time  intervals  produced  were  not  longer 
when  the  memory  set  was  inactive.  Activating  the  memory  set  did  not  interfere  with  timing. 
Timing  and  activating  a  memory  set  do  not  compete  for  capacity,  but  timing  and  search  do. 




Table from Schweickert, Fortin, Xi & Viau-Quesnel (Submitted)

Mean Reaction Times, Time Productions and Percent Errors

Reaction Time Condition

                              Memory Set
Probe      Active         Active         Inactive       Inactive
           Size 3         Size 6         Size 3         Size 6
Present    830 (3.5)      933 (6.2)      921 (10.7)     1003 (12.6)
Absent     880 (1.1)      969 (2.8)      928 (3.7)      1016 (7.0)

Time Production Condition

                              Memory Set
Probe      Active         Active         Inactive       Inactive
           Size 3         Size 6         Size 3         Size 6
Present    3419 (3.6)     3530 (8.0)     3530 (13.9)    3463 (13.7)
Absent     3418 (1.8)     3548 (4.4)     3530 (7.0)     3469 (10.9)

Note: Times in msec, percent errors in parentheses.
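
The additivity and multiplicativity statements in the Results can be checked descriptively against the cell means in the Reaction Time Condition of the table above. The sketch below is not the analysis used in the manuscript; it only computes, from the tabled values, the interaction contrast of the two factors for mean reaction time (zero would mean perfectly additive effects) and the ratio of cross-products for the probability of a correct response (one would mean perfectly multiplicative effects). The function names are illustrative.

```python
# Cell means transcribed from the Reaction Time Condition of the table.
# Keys are (memory pool, set size); RT in msec, errors in percent.
rt = {
    "present": {("active", 3): 830, ("active", 6): 933,
                ("inactive", 3): 921, ("inactive", 6): 1003},
    "absent": {("active", 3): 880, ("active", 6): 969,
               ("inactive", 3): 928, ("inactive", 6): 1016},
}
err = {
    "present": {("active", 3): 3.5, ("active", 6): 6.2,
                ("inactive", 3): 10.7, ("inactive", 6): 12.6},
    "absent": {("active", 3): 1.1, ("active", 6): 2.8,
               ("inactive", 3): 3.7, ("inactive", 6): 7.0},
}


def rt_interaction(cells):
    """Pool x set-size interaction contrast; 0 means perfectly additive effects."""
    return ((cells[("inactive", 6)] - cells[("inactive", 3)])
            - (cells[("active", 6)] - cells[("active", 3)]))


def accuracy_ratio(errors):
    """Ratio of cross-products of P(correct); 1 means perfectly multiplicative effects."""
    p = {cell: 1 - e / 100 for cell, e in errors.items()}
    return (p[("active", 3)] * p[("inactive", 6)]) / (p[("active", 6)] * p[("inactive", 3)])


for probe in ("present", "absent"):
    print(probe, rt_interaction(rt[probe]), round(accuracy_ratio(err[probe]), 4))
```

The contrasts are small relative to the main effects (tens of msec against main effects of roughly 50 to 100 msec), and the ratios are close to one; whether the deviations are statistically negligible is a question for the manuscript's own tests, which are not reproduced here.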
Causal networks with selectively influenced components

Report  on  the  part  under  Ehtibar  Dzhafarov’s  supervision 


1  Introduction 

1.1  The  problem  and  its  applications 

This part of the project was aimed at further developing the theory of selective probabilistic causality.
The theory answers the question: Given a set of inputs into a system (e.g., independent variables
characterizing  stimuli  in  a  psychological  experiment)  and  a  set  of  stochastically  non-independent 
random  outputs  (e.g.,  random  variables  describing  different  aspects  of  human  responses),  how  can 
one  determine,  for  each  of  the  outputs,  which  of  the  inputs  it  is  influenced  by? 

The  theory  has  applications  in  behavioral  and  social  sciences,  including  such  problems  as:  in  the 
investigations  of  networks  of  mental  operations,  does  a  certain  experimental  manipulation  selectively 
influence  only  a  certain  component  of  the  network?  in  conjoint  testing,  does  study  time  or  specific 
training  for  one  of  the  tests  selectively  influence  one’s  performance  in  this  test  only?  in  studying 
perceptual  judgments,  is  an  assessment  of  a  given  stimulus  property  selectively  influenced  by  this 
property  alone?  in  medical  research,  does  the  presence  or  absence  of  a  given  symptom  selectively 
depend  on  a  given  illness? 

The  theory  also  has  applications  in  quantum  mechanics,  in  answering  such  questions  as:  can 
a  model  with  local  non-contextual  variables  account  for  the  distribution  of  spins  in  a  system  of 
entangled  particles?  The  non-commuting  measurements  of  spins  along  different  axes  performed  on 
a  given  particle  correspond  to  the  mutually  exclusive  values  of  an  experimental  manipulation  in 
behavioral  applications. 

Other  applications  of  the  theory  can  be  deduced  from  the  fact  that  it  generalizes  all  conceivable 
combinations of nonlinear factor and regression analyses, with no constraints imposed on the
relationship between explanatory and response variables, or on the unobservable sources of randomness.

This part of the project involved, as senior personnel, Janne Kujala of the University of Jyväskylä,
Finland.

1.2  Notation  and  basic  definitions 

The  problem  can  be  illustrated  on  the  following  diagram  of  selective  influences,  shown  in  two 
equivalent  forms: 
$A^1$, $A^2$, and $A^3$ here are random outputs, $\alpha$, $\beta$, $\gamma$, and $\delta$ are inputs (also referred to as external
factors), and arrows indicate the relation “influences.” The right-hand diagram is in the canonical
form, with the factors redefined so that each random output $A^i$ is influenced by a single factor,
$\alpha^i$. Factors are treated as deterministic quantities, i.e., even if they are random variables, the joint
distribution of the outputs is always conditioned on their specific values (or levels). Each factor
can be on one of several levels, and the joint distribution of $(A^1, A^2, A^3)$ is supposed to be known
for each allowable combination of factor levels (treatment). Thus, if factors $\alpha, \beta, \gamma, \delta$ are all binary,
then each of the $2 \times 2 \times 2 \times 2 = 16$ logically possible combinations is a potential treatment, but the
joint distribution of $(A^1, A^2, A^3)$ may only be considered for some of them (e.g., treatments that
have not been used in the experiment or are physically impossible are not allowable). This is an
important consideration if one wishes to conveniently deal with the canonical diagrams of selective
influences only. In our example, $\alpha^1, \alpha^2, \alpha^3$ have 8, 2, and 8 levels, respectively, but the number of
allowable treatments cannot exceed $16 < 8 \times 2 \times 8$.

The general theory and the pseudo-quasi-metric tests discussed in Section 2.2 have been developed
for arbitrary sets of factors and outputs, but to keep notation simple this report is confined to
finite  sets  only.  Also,  for  simplicity  only,  the  random  outputs  are  assumed  to  be  random  variables 
in  the  narrow  sense  (corresponding  to  the  conventional  Lebesgue  measure  or  countable  measure); 
they  may  be  arbitrary  in  the  general  theory. 

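Let $\Lambda_i$ be the set of possible levels of factor $\alpha^i$ ($i = 1, \ldots, n$) in a canonical diagram of selective
influences

$$\begin{array}{ccccc} \alpha^1 & \ldots & \alpha^i & \ldots & \alpha^n \\ \downarrow & & \downarrow & & \downarrow \\ A^1 & \ldots & A^i & \ldots & A^n \end{array}$$

Let $\Phi \subset \Lambda_1 \times \ldots \times \Lambda_n$ be the set of allowable treatments, and for every treatment $\phi \in \Phi$ let the
joint distribution of $(A^1_\phi, \ldots, A^n_\phi)$ be given. The first question is: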

If at least for some of the treatments $\phi$ the random outputs are not stochastically
independent, what is the meaning of saying that $A^1$ is selectively (exclusively) influenced
by $\alpha^1$, $A^2$ by $\alpha^2$, etc.?

And assuming that a reasonable definition of selective influences is achieved (which means a
definition satisfying certain desiderata, listed below, and lending itself to fruitful mathematical
development), the second question is:

How can one determine, based on the joint distributions of $(A^1_\phi, \ldots, A^n_\phi)$ for each $\phi \in \Phi$,
whether the canonical diagram of selective influences is satisfied?

The  reasonable  definition  in  question  can  be  given  in  one  of  two  equivalent  forms: 

(SI$_1$) there are independent random entities $C, S^1, \ldots, S^n$ and functions $R^i(\alpha^i, C, S^i)$ ($i = 1, \ldots, n$)
such that

$$\left(R^1(j_1, C, S^1), \ldots, R^n(j_n, C, S^n)\right) \sim \left(A^1_\phi, \ldots, A^n_\phi\right)$$

for any $\phi = (j_1, \ldots, j_n) \in \Phi$ ($\sim$ meaning “identically distributed”);

(SI$_2$) there is a random entity $C$ and functions $R^i(\alpha^i, C)$ ($i = 1, \ldots, n$) such that

$$\left(R^1(j_1, C), \ldots, R^n(j_n, C)\right) \sim \left(A^1_\phi, \ldots, A^n_\phi\right)$$

for any $\phi = (j_1, \ldots, j_n) \in \Phi$.

In quantum physics (see Section 2.4) these formulations correspond to the classical explanations of
the entanglement phenomena with, respectively, stochastic and deterministic hidden variables.

The fact that either of these definitions (SI$_1$ or SI$_2$) is satisfied is schematically indicated as
$(A^1, \ldots, A^n) \looparrowleft (\alpha^1, \ldots, \alpha^n)$.


2  Progress 


2.1  Joint  Distribution  Criterion 

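Definitions SI$_1$ and SI$_2$ were shown to be equivalent to the following proposition, called the Joint
Distribution Criterion:

(JDC) there is a jointly distributed vector of random variables

$$H = \left(H^1_1, \ldots, H^1_{k_1}, \ldots, H^n_1, \ldots, H^n_{k_n}\right)$$

(one variable $H^i_j$ for each level $j$ of each factor $\alpha^i$, with $k_i$ denoting the number of levels of $\alpha^i$) such that

$$\left(H^1_{j_1}, \ldots, H^n_{j_n}\right) \sim \left(A^1_\phi, \ldots, A^n_\phi\right)$$

for any $\phi = (j_1, \ldots, j_n) \in \Phi$.

$H$ is called the JDC-vector. This criterion has a greater heuristic power than definitions SI$_1$ and
SI$_2$. Some of the immediate consequences of JDC are as follows: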

1. for any subset $\{i_1, \ldots, i_k\}$ of $\{1, \ldots, n\}$, $(A^{i_1}, \ldots, A^{i_k})$ does not depend on factors outside
$(\alpha^{i_1}, \ldots, \alpha^{i_k})$ (complete marginal selectivity);

2. for any subset $\{i_1, \ldots, i_k\}$ of $\{1, \ldots, n\}$ we have $(A^{i_1}, \ldots, A^{i_k}) \looparrowleft (\alpha^{i_1}, \ldots, \alpha^{i_k})$ (nestedness);

3. for any measurable functions $F_1(\alpha^1, a^1), \ldots, F_n(\alpha^n, a^n)$ we have

$$\left(F_1(\alpha^1, A^1), \ldots, F_n(\alpha^n, A^n)\right) \looparrowleft (\alpha^1, \ldots, \alpha^n)$$

(invariance with respect to factor-level-specific transformations of random outputs);

4. if $(A^1, \ldots, A^n)$ are random variables in the narrow sense, then $C$ in SI$_2$ or $C, S^1, \ldots, S^n$
in SI$_1$ can always be chosen to be random variables in the narrow sense. Moreover, they
can be chosen arbitrarily as any continuously (atomlessly) distributed random variables, e.g.,
uniformly distributed between 0 and 1.
2.2 Distance-like functions

Let $X = \{X_\omega : \omega \in \Omega\}$ be an indexed set of jointly distributed random variables $X_\omega$ with
distributions $(V_\omega, \Sigma_\omega, \mu_\omega)$. For any $\alpha, \beta \in \Omega$, the ordered pair $(X_\alpha, X_\beta)$ is a random variable with
distribution $(V_\alpha \times V_\beta, \Sigma_\alpha \times \Sigma_\beta, \mu_{\alpha,\beta})$, and $X \times X$ is a set of jointly distributed random variables
(hence also a random variable).

We call a function $d : X \times X \to \mathbb{R}$ a pseudo-quasi-metric (p.q.-metric) on $X$ if, for all $\alpha, \beta, \gamma \in \Omega$,

(i) $d(X_\alpha, X_\beta)$ only depends on the joint distribution of $(X_\alpha, X_\beta)$,

(ii) $d(X_\alpha, X_\beta) \geq 0$,

(iii) $d(X_\alpha, X_\alpha) = 0$,

(iv) $d(X_\alpha, X_\gamma) \leq d(X_\alpha, X_\beta) + d(X_\beta, X_\gamma)$.

Conventional pseudometrics (also called semimetrics) are obtained by adding the property $d(X_\alpha, X_\beta) =
d(X_\beta, X_\alpha)$; conventional quasimetrics are obtained by adding the property $\alpha \neq \beta \Rightarrow d(X_\alpha, X_\beta) > 0$.
A conventional metric is both a pseudometric and a quasimetric.

The relevance of the p.q.-metrics on the sets of jointly distributed random variables to the
problem of selectivity lies in the general test (necessary condition) for selectivity of influences,
formulated after the following definition.

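We call a sequence of input points $(j_1, \ldots, j_l)$ (where $j_i \in \Lambda_i$ for $i = 1, \ldots, l \geq 3$) treatment-realizable
if there are treatments $\phi^1, \ldots, \phi^l \in \Phi$ (not necessarily pairwise distinct), such that

$$\{j_1, j_l\} \subset \phi^1 \quad \text{and} \quad \{j_{i-1}, j_i\} \subset \phi^i \ \text{for } i = 2, \ldots, l.$$

Now, if a JDC-vector $H$ exists, then for any p.q.-metric $d$ on $H$ we should have

$$d\left(H_{j_1}, H_{j_l}\right) = d\left(A^1_{\phi^1}, A^l_{\phi^1}\right)$$

and

$$d\left(H_{j_{i-1}}, H_{j_i}\right) = d\left(A^{i-1}_{\phi^i}, A^i_{\phi^i}\right)$$

for $i = 2, \ldots, l$ (here $H_{j_i}$ is the component of $H$ for the input point $j_i$, and $A^i_\phi$ is the output influenced by the factor to which $j_i$ belongs), whence, by repeated application of the triangle inequality (iv),

$$d\left(A^1_{\phi^1}, A^l_{\phi^1}\right) \leq \sum_{i=2}^{l} d\left(A^{i-1}_{\phi^i}, A^i_{\phi^i}\right). \qquad (1)$$

This chain inequality, written entirely in terms of observable probabilities, is referred to as a p.q.-metric
test for selectivity of influences. If this inequality is violated for at least one treatment-realizable
sequence of input points, no JDC-vector $H$ exists, and the selectivity is ruled out.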

It turns out that one only needs to check the chain inequality for a special subset of all possible
treatment-realizable sequences $j_1, \ldots, j_l$. Namely, a treatment-realizable sequence $j_1, \ldots, j_l$ is called
irreducible if $j_1 \neq j_l$ and the only subsequences $\{j_{i_1}, \ldots, j_{i_k}\}$ with $k > 1$ that are subsets of
treatments are pairs $\{j_1, j_l\}$ and $\{j_{i-1}, j_i\}$ for $i = 2, \ldots, l$. Otherwise the sequence is reducible. It
was proved that

given a p.q.-metric $d$ on the hypothetical JDC-vector $H$, inequality (1) is satisfied for
all treatment-realizable sequences if and only if this inequality holds for all irreducible
sequences.

As a special case,

if $\Phi = \Lambda_1 \times \ldots \times \Lambda_n$ (a completely crossed factorial design), then inequality (1) is
satisfied for all treatment-realizable sequences if and only if this inequality holds for all
tetradic sequences of the form $x, y, s, t$, with $x, s \in \{i\} \times \Lambda_i$, $y, t \in \{j\} \times \Lambda_j$, $x \neq s$, $y \neq t$,
$i \neq j$.

A very versatile and useful class of p.q.-metrics is formed by order-distances. Given an indexed set
of jointly distributed random variables $X = \{X_\omega : \omega \in \Omega\}$, let

$$R \subset \bigcup_{(\alpha, \beta) \in \Omega \times \Omega} V_\alpha \times V_\beta,$$

where $V_\omega$ denotes the set of possible values of $X_\omega$. We write $a \preceq b$ to designate $(a, b) \in R$.
Let $R$ be a total order, that is, transitive, reflexive, and connected in the sense that for any
$(a, b) \in \bigcup_{(\alpha, \beta) \in \Omega \times \Omega} V_\alpha \times V_\beta$ at least one of the relations $a \preceq b$ and $b \preceq a$ holds. We define the
equivalence $a \sim b$ and strict order $a \prec b$ induced by $\preceq$ in the usual way. Finally, we assume that
for any $(\alpha, \beta) \in \Omega \times \Omega$, the sets

$$\{(a, b) : a \in V_\alpha,\ b \in V_\beta,\ a \preceq b\}$$

are $\mu_{\alpha,\beta}$-measurable, where $\mu_{\alpha,\beta}$ is the probability measure for $(X_\alpha, X_\beta)$. This implies the $\mu_{\alpha,\beta}$-measurability
of the sets

$$\{(a, b) : a \in V_\alpha,\ b \in V_\beta,\ a \prec b\}, \quad \{(a, b) : a \in V_\alpha,\ b \in V_\beta,\ a \sim b\}.$$

The function

$$D(X_\alpha, X_\beta) = \Pr\left[X_\alpha \prec X_\beta\right]$$

is called an order p.q.-metric, or order-distance, on $X$. It was proved that $D$ satisfies the properties
(i)-(iv) of the definition of a p.q.-metric.

As an example of an order-distance applied to the selectivity problem, let $\Lambda_1 = [0, 1]$, $\Lambda_2 = [0, 1]$,
$\Phi = \Lambda_1 \times \Lambda_2$. Let $(A^1_\phi, A^2_\phi)$ for any treatment $\phi = (\omega_1, \omega_2)$ have a bivariate normal distribution with
zero means, unit variances, and correlation $\rho = \min(1, \omega_1 + \omega_2)$. Marginal selectivity is trivially
satisfied. For any bivariate normally distributed $(A, B)$, let us define $A \prec B$ iff $A < 0$, $B \geq 0$. Then
the corresponding order-distance on the hypothetical JDC-set $H$ is

$$D\left(H^1_{\omega_1}, H^2_{\omega_2}\right) = \frac{\arccos\left(\min(1, \omega_1 + \omega_2)\right)}{2\pi}.$$

The sequence of input points $(1, 0), (2, 1), (1, 1), (2, 0)$, each written as (factor index, level), is treatment-realizable, so if $H$ exists, we
should have

$$D\left(H^1_0, H^2_0\right) \leq D\left(H^1_0, H^2_1\right) + D\left(H^2_1, H^1_1\right) + D\left(H^1_1, H^2_0\right).$$

The numerical substitutions yield, however,

$$\frac{1}{4} \leq 0 + 0 + 0,$$

and as this is false, the hypothesis of selectivity is rejected.
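
The numerical substitution in this example can be reproduced in a few lines. The sketch below assumes only what the example states: the order-distance between the hypothetical variables for levels $\omega_1$ and $\omega_2$ equals $\arccos(\min(1, \omega_1 + \omega_2)) / 2\pi$, and input points are written as (factor index, level). The function names are illustrative.

```python
import math


def order_distance(w1, w2):
    """Pr[A < 0, B >= 0] for a standard bivariate normal with rho = min(1, w1 + w2)."""
    rho = min(1.0, w1 + w2)
    return math.acos(rho) / (2 * math.pi)


def d(p, q):
    """Order-distance between the H-variables of two input points of different factors.

    In this example the correlation depends only on the sum of the two levels,
    so the value does not depend on which point comes first.
    """
    (factor_p, level_p), (factor_q, level_q) = p, q
    assert factor_p != factor_q, "the two points must belong to different factors"
    return order_distance(level_p, level_q)


# Treatment-realizable sequence from the text, as (factor index, level).
sequence = [(1, 0), (2, 1), (1, 1), (2, 0)]

lhs = d(sequence[0], sequence[-1])
rhs = sum(d(sequence[i - 1], sequence[i]) for i in range(1, len(sequence)))
chain_holds = lhs <= rhs
print(lhs, rhs, chain_holds)
```

The left-hand side evaluates to 1/4 and every term on the right to 0, so the chain inequality fails, in agreement with the conclusion above.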

This example generalizes into a special class of order-distances, classification distances, defined
by the following construction of $\preceq$: provided the sigma-algebra associated with each $V_\omega$ contains
at least $n > 1$ disjoint nonempty sets, one can partition each $V_\omega$ as $\bigcup_{k=1}^{n} V_\omega^{(k)}$ with $V_\omega^{(k)} \in \Sigma_\omega$, and
put $a \preceq b$ if and only if $a \in V_\alpha^{(k)}$, $b \in V_\beta^{(l)}$ and $k \leq l$. Another application of classification distances
will be given in Section 2.4.
2.3 Linear Feasibility Test

Let now each random variable $A^i$ have a finite set of $m_i$ possible values (enumerated for simplicity
$1, \ldots, m_i$), and let each factor/input $\alpha^i$ contain $k_i$ factor levels (enumerated $1, \ldots, k_i$). This is
arguably the most important special case, both because it is ubiquitous and because in all other cases
random variables and factors can be discretized into a finite number of categories. The Linear
Feasibility Test (LFT) to be described is a direct application of JDC to this situation, furnishing a necessary
and sufficient condition for the diagram of selective influences $(A^1, \ldots, A^n) \looparrowleft (\alpha^1, \ldots, \alpha^n)$.

The distributions of $(A^1_\phi, \ldots, A^n_\phi)$ are represented by probabilities

$$\Pr\left[A^1_\phi = a_1, \ldots, A^n_\phi = a_n\right],$$

for all $\phi = (j_1, \ldots, j_n) \in \Phi$ and all $(a_1, \ldots, a_n) \in \{1, \ldots, m_1\} \times \ldots \times \{1, \ldots, m_n\}$. We consider
this probability the $[(a_1, \ldots, a_n), (j_1, \ldots, j_n)]$th component of the $m_1 \cdots m_n t$-vector $P$ of all such
probabilities (with $t$ denoting the number of treatments in $\Phi$). The joint distribution of $H$ in JDC,
if it exists, is represented by probabilities

$$\Pr\left[H^1_1 = h^1_1, \ldots, H^1_{k_1} = h^1_{k_1}, \ldots, H^n_1 = h^n_1, \ldots, H^n_{k_n} = h^n_{k_n}\right],$$

with

$$\left(h^1_1, \ldots, h^1_{k_1}, \ldots, h^n_1, \ldots, h^n_{k_n}\right) \in \{1, \ldots, m_1\}^{k_1} \times \ldots \times \{1, \ldots, m_n\}^{k_n}.$$

We consider this probability the $(h^1_1, \ldots, h^1_{k_1}, \ldots, h^n_1, \ldots, h^n_{k_n})$th component of the $(m_1)^{k_1} \cdots (m_n)^{k_n}$-vector
$Q$ of all such hypothetical probabilities.

Consider now the Boolean matrix $M$ with rows corresponding to components of $P$ and columns
to components of $Q$: let $M(r, c) = 1$ if and only if row $r$ corresponds to the $[(a_1, \ldots, a_n), (j_1, \ldots, j_n)]$th
component of $P$, column $c$ to the $(h^1_1, \ldots, h^1_{k_1}, \ldots, h^n_1, \ldots, h^n_{k_n})$th component of $Q$, and

$$h^1_{j_1} = a_1, \ldots, h^n_{j_n} = a_n.$$

Clearly, the vector $Q$ exists if and only if the system

$$MQ = P, \quad Q \geq 0 \qquad (2)$$

has  a  solution  (is  feasible).  This  is  a  linear  programming  task  in  the  standard  form  (with  a  constant 
objective  function).  Let  C(P)  be  a  Boolean  function  equal  to  1  if  and  only  if  this  system  is 
feasible.  C  ( P )  is  known  in  linear  programming  to  always  be  computable,  its  time  complexity  being 
polynomial.  It  is  therefore  justifiable  to  call  JDC  a  general  solution  for  the  problem  of  rejecting  or 
confirming  a  diagram  of  selective  influences  in  all  cases  involving  only  finite  sets  of  values/levels. 

The linear system (2) is feasible if and only if the point $P$ belongs to the convex hull of the
points corresponding to the columns of $M$, which form a subset of the vertices of a unit hypercube.
In particular, if the set $\Phi$ of allowable treatments contains all combinations of factor points, the
polytope is the $((k_1(m_1 - 1) + 1) \cdots (k_n(m_n - 1) + 1) - 1)$-dimensional convex hull of the points
corresponding to the columns of the Boolean matrix $M$, which form a subset of the vertices of the
$(m_1)^{k_1} \cdots (m_n)^{k_n}$-dimensional unit hypercube.

As an example, let there be factors $\alpha = \{1, 2\}$, $\beta = \{1, 2\}$, and let the set of allowable treatments
$\Phi$ consist of all four possible combinations of the factor points. Let $A$ and $B$ be binary variables
with values $a_1 = b_1 = 1$, $a_2 = b_2 = 2$, distributed as shown:

α  β    A  B    Pr
1  1    1  1    .140
1  1    1  2    .360
1  1    2  1    .360
1  1    2  2    .140
1  2    1  1    .198
1  2    1  2    .302
1  2    2  1    .302
1  2    2  2    .198
2  1    1  1    .189
2  1    1  2    .311
2  1    2  1    .311
2  1    2  2    .189
2  2    1  1    .460
2  2    1  2    .040
2  2    2  1    .040
2  2    2  2    .460

Marginal selectivity here is satisfied trivially: all marginal probabilities are equal to 0.5, for all
treatments. In the matrix form of the LFT, the column-vector of the above 16 probabilities,

$(.140, .360, .360, \ldots, .040, .040, .460)^T$,

using $T$ for transposition, is denoted by $P$. The LFT problem is defined by the system $MQ = P$,
$Q \geq 0$, where the $16 \times 16$ Boolean matrix $M$ is shown below: each column of the matrix corresponds
to a combination of values for the hypothetical $H$-variables (shown above the matrix), while each
row corresponds to a combination of a treatment with values of the outputs $A$, $B$ (shown on the
left).


[Matrix $M$ ($16 \times 16$): columns indexed by the values of the four hypothetical $H$-variables, from (1, 1, 1, 1) to (2, 2, 2, 2), with the last variable varying fastest; rows indexed by the treatment $(\alpha, \beta)$ and the output values $(A, B)$. Entries not recoverable.]

The linear programming routine of Mathematica™ (using the interior point algorithm) shows that
here the linear system (2) has solutions corresponding to the JDC-vector

H^A_1  H^A_2  H^B_1  H^B_2    Pr
1      1      1      1        .02708610
1      1      1      2        .00239295
1      1      2      1        .16689300
1      1      2      2        .03358610
1      2      1      1        .00197965
1      2      1      2        .10854100
1      2      2      1        .00204128
1      2      2      2        .15748000
2      1      1      1        .15748000
2      1      1      2        .00204128
2      1      2      1        .10854100
2      1      2      2        .00197965
2      2      1      1        .03358610
2      2      1      2        .16689300
2      2      2      1        .00239295
2      2      2      2        .02708610

(Here $H^A_j$ denotes the component of $H$ for output $A$ at level $j$ of $\alpha$, and $H^B_j$ the component for output $B$ at level $j$ of $\beta$.)

The column-vector of these probabilities constitutes $Q \geq 0$. This proves that in this case we do
have $(A, B) \looparrowleft (\alpha, \beta)$.
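
That the reported vector solves system (2) can be checked directly, without a linear programming routine. The sketch below is a verification aid rather than the procedure used in the report. It assumes that the four columns of the solution table are, in order, the hypothetical values of $A$ at levels 1 and 2 of $\alpha$ and of $B$ at levels 1 and 2 of $\beta$; the names `m_entry`, `P`, and `Q` are illustrative. It builds the entries of $M$ from their definition and compares $MQ$ with $P$.

```python
from itertools import product

# Observed Pr[A = a, B = b] for each treatment (alpha, beta), from the first table.
P = {
    (1, 1): {(1, 1): .140, (1, 2): .360, (2, 1): .360, (2, 2): .140},
    (1, 2): {(1, 1): .198, (1, 2): .302, (2, 1): .302, (2, 2): .198},
    (2, 1): {(1, 1): .189, (1, 2): .311, (2, 1): .311, (2, 2): .189},
    (2, 2): {(1, 1): .460, (1, 2): .040, (2, 1): .040, (2, 2): .460},
}

# Reported solution, listed from (1, 1, 1, 1) to (2, 2, 2, 2).
q_values = [
    .02708610, .00239295, .16689300, .03358610,
    .00197965, .10854100, .00204128, .15748000,
    .15748000, .00204128, .10854100, .00197965,
    .03358610, .16689300, .00239295, .02708610,
]
# Assumed column order: (H^A at alpha=1, H^A at alpha=2, H^B at beta=1, H^B at beta=2).
Q = dict(zip(product((1, 2), repeat=4), q_values))


def m_entry(alpha, beta, a, b, h):
    """M(r, c) = 1 iff column h assigns value a to A at level alpha and b to B at level beta."""
    h_a = h[alpha - 1]
    h_b = h[2 + beta - 1]
    return 1 if (h_a == a and h_b == b) else 0


max_error = 0.0
for (alpha, beta), dist in P.items():
    for (a, b), p in dist.items():
        mq = sum(m_entry(alpha, beta, a, b, h) * q for h, q in Q.items())
        max_error = max(max_error, abs(mq - p))

nonnegative = min(q_values) >= 0
print(nonnegative, max_error)
```

All sixteen components of $MQ$ agree with $P$ to within the rounding of the reported eight-digit values, and all components of $Q$ are nonnegative.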

Let us now change the distributions of $(A, B)$ to the following:

α  β    A  B    Pr
1  1    1  1    .450
1  1    1  2    .050
1  1    2  1    .050
1  1    2  2    .450
1  2    1  1    .105
1  2    1  2    .395
1  2    2  1    .395
1  2    2  2    .105
2  1    1  1    .170
2  1    1  2    .330
2  1    2  1    .330
2  1    2  2    .170
2  2    1  1    .110
2  2    1  2    .390
2  2    2  1    .390
2  2    2  2    .110

Once again, marginal selectivity is satisfied trivially, as all marginal probabilities are 0.5, for all
treatments. The linear programming routine of Mathematica™, however, shows that the linear
system (2) has no solutions here. This excludes the existence of a JDC-vector for this situation,
thereby ruling out the possibility of $(A, B) \looparrowleft (\alpha, \beta)$.
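
The infeasibility can also be confirmed by hand with a separating linear functional, using the convex-hull description of system (2) given earlier. This is not how the report established the result (it used a linear programming routine); the sketch below is an independent check, and the sign pattern `SIGNS` is chosen by hand for this illustration. Because any feasible $P$ is a convex combination of the columns of $M$, a linear functional of $P$ cannot fall below its minimum over those columns.

```python
from itertools import product

# Pr[A = a, B = b] for each treatment (alpha, beta), from the second table.
P2 = {
    (1, 1): {(1, 1): .450, (1, 2): .050, (2, 1): .050, (2, 2): .450},
    (1, 2): {(1, 1): .105, (1, 2): .395, (2, 1): .395, (2, 2): .105},
    (2, 1): {(1, 1): .170, (1, 2): .330, (2, 1): .330, (2, 2): .170},
    (2, 2): {(1, 1): .110, (1, 2): .390, (2, 1): .390, (2, 2): .110},
}

# Hand-picked weights (an assumption of this sketch): one sign per treatment.
SIGNS = {(1, 1): -1, (1, 2): +1, (2, 1): +1, (2, 2): +1}


def functional(dists):
    """Sum over treatments of sign * (Pr[A == B] - Pr[A != B]); linear in the probabilities."""
    total = 0.0
    for treatment, dist in dists.items():
        agreement = sum(p if a == b else -p for (a, b), p in dist.items())
        total += SIGNS[treatment] * agreement
    return total


def column_distribution(h):
    """Treatment distributions generated by one column of M (a deterministic H)."""
    h_a = {1: h[0], 2: h[1]}
    h_b = {1: h[2], 2: h[3]}
    return {(al, be): {(h_a[al], h_b[be]): 1.0} for al in (1, 2) for be in (1, 2)}


lower_bound = min(functional(column_distribution(h)) for h in product((1, 2), repeat=4))
observed = functional(P2)
print(lower_bound, observed)
```

The minimum over the sixteen columns is -2, while the observed distributions give approximately -2.26, so $P$ lies outside the convex hull of the columns of $M$ and system (2) cannot be feasible.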

2.4 Parallels with Quantum Physics

Both  the  Linear  Feasibility  Test  and  the  Joint  Distribution  Criterion  on  which  it  is  based  have 
their  analogues  in  quantum  physics.  To  appreciate  the  analogy,  however,  one  has  to  adopt  the 
interpretation  of  noncommuting  quantum  measurements  performed  on  a  given  component  of  a 
quantum-entangled  system  as  mutually  exclusive  factor  levels  of  the  same  factor. 

In the Einstein-Podolsky-Rosen (EPR) paradigm, several subatomic particles are emitted from
a common source in such a way that they remain entangled (have highly correlated properties,
such as momenta or spins) as they run away from each other. An experiment may consist, e.g., in
measuring the spin of electron 1 along one of several axes, $\alpha^1 = \alpha^1_1$, $\alpha^1 = \alpha^1_2$, etc., and (in another
location but simultaneously in some frame of reference) measuring the spin of electron 2 along one of
several axes, $\alpha^2 = \alpha^2_1$, $\alpha^2 = \alpha^2_2$, etc., and in the same manner for other particles. The outcome $A^i$
of a measurement along any axis on particle $i = 1, \ldots, n$ is a random variable with several possible
values, depending on the spin number of the particles (for electrons, there are two possible values,
"up" or "down"). The question that arises is: does measurement $\alpha^i$ selectively affect only $A^i$ (even
though $A^1, \ldots, A^n$ are not independent)? If the answer is negative, then the measurement of one
electron affects the outcome of the measurement of another electron even though no information
can be exchanged between two distant events that are simultaneous in some frame of reference.
What makes this situation formally identical to the selective influence problems considered above
is that the measurements along two different axes, say, $\alpha^i_1$ and $\alpha^i_2$, are non-commuting, i.e., they
cannot be performed on the $i$th particle simultaneously. This makes it possible to consider them
as levels of factor $\alpha^i$.

Below is the table of correspondences between the general language of selective probabilistic
causality and the quantum-mechanical notions used in the analysis of spins of entangled particles:

  Selective Probabilistic Causality (general)    Quantum Entanglement Problem (for spins)
  -------------------------------------------    ----------------------------------------
  observed random output                         detected spin value of a given particle
  factor / input                                 spin measurement in a given particle
  factor level                                   setting (axis) of the spin measurement
  joint distribution criterion                   joint distribution criterion
  canonical diagram of selective influences      "classical" explanation (by context-independent local variables)
  representation in the form SI1                 probabilistic "classical" explanation
  representation in the form SI2                 deterministic "classical" explanation

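The results of the simplest entanglement experiment, with $n = 2$, $k_1 = k_2 = 2$, $m_1 = m_2 = 2$,
are described by the estimates of 16 probabilities

$$
p(A^1 A^2; \alpha^1 \alpha^2) = \Pr\left[ A^1 = \text{up or down},\ A^2 = \text{up or down} \mid \alpha^1 = \alpha^1_1 \text{ or } \alpha^1_2,\ \alpha^2 = \alpha^2_1 \text{ or } \alpha^2_2 \right],
$$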


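with nothing preventing one, of course, from encoding both $\alpha^1_1$ and $\alpha^2_1$ by 1 and $\alpha^1_2$ and $\alpha^2_2$ by 2.
Encoding "down" and "up" spins for $A$ by $\bullet$ and $\circ$, and for $B$ by $\sqcup$ and $\sqcap$, we get

$$
\begin{array}{c|cc|c}
\phi = (1,1) & B_{11} = \sqcup & B_{11} = \sqcap & \\ \hline
A_{11} = \bullet & p_{11} & p_{12} & a_{1\cdot} \\
A_{11} = \circ & p_{21} & p_{22} & a_{2\cdot} \\ \hline
 & b_{\cdot 1} & b_{\cdot 2} &
\end{array}
\qquad
\begin{array}{c|cc|c}
\phi = (2,1) & B_{21} = \sqcup & B_{21} = \sqcap & \\ \hline
A_{21} = \bullet & r_{11} & r_{12} & a'_{1\cdot} \\
A_{21} = \circ & r_{21} & r_{22} & a'_{2\cdot} \\ \hline
 & b_{\cdot 1} & b_{\cdot 2} &
\end{array}
$$

$$
\begin{array}{c|cc|c}
\phi = (1,2) & B_{12} = \sqcup & B_{12} = \sqcap & \\ \hline
A_{12} = \bullet & q_{11} & q_{12} & a_{1\cdot} \\
A_{12} = \circ & q_{21} & q_{22} & a_{2\cdot} \\ \hline
 & b'_{\cdot 1} & b'_{\cdot 2} &
\end{array}
\qquad
\begin{array}{c|cc|c}
\phi = (2,2) & B_{22} = \sqcup & B_{22} = \sqcap & \\ \hline
A_{22} = \bullet & s_{11} & s_{12} & a'_{1\cdot} \\
A_{22} = \circ & s_{21} & s_{22} & a'_{2\cdot} \\ \hline
 & b'_{\cdot 1} & b'_{\cdot 2} &
\end{array}
$$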

It has been known since Arthur Fine's work (J. Math. Phys. 23, 1306-1310, 1982) that the existence
of the JDC-vector for this situation (interpreted as the existence of a classical explanation for it) is
equivalent to the probabilities satisfying the following inequalities:

$$
\begin{aligned}
-1 &\leq p_{11} + r_{11} + s_{11} - q_{11} - a'_{1\cdot} - b_{\cdot 1} \leq 0, \\
-1 &\leq q_{11} + s_{11} + r_{11} - p_{11} - a'_{1\cdot} - b'_{\cdot 1} \leq 0, \\
-1 &\leq r_{11} + p_{11} + q_{11} - s_{11} - a_{1\cdot} - b_{\cdot 1} \leq 0, \\
-1 &\leq s_{11} + q_{11} + p_{11} - r_{11} - a_{1\cdot} - b'_{\cdot 1} \leq 0.
\end{aligned}
\tag{3}
$$

By applying the LFT to the matrices above, these inequalities are shown to be necessary and
sufficient for (2) to have a solution, with

$$
P = (p_{11}, p_{12}, \ldots, s_{21}, s_{22})^T
$$

and $M$ the same 16 x 16 Boolean matrix as in Section 2.3. In fact, using a standard facet enumeration
program (e.g., the lrs program at http://cgm.cs.mcgill.ca/~avis/C/lrs.html) these inequalities (together
with the equalities representing marginal selectivity) can be derived from (2) "mechanically."
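
The inequalities (3) are easy to evaluate numerically. The minimal Python sketch below applies them to the second numerical example given before Section 2.4 (the one for which system (2) has no solution) and, for contrast, to a hypothetical treatment-independent uniform distribution. It assumes that marginal selectivity holds and that the value 1 encodes $\bullet$ for $A$ and $\sqcup$ for $B$; the function names are illustrative and not part of the original analysis.

```python
def fine_expressions(p, q, r, s):
    """Evaluate the four middle expressions of (3).

    p, q, r, s are the joint distributions of (A, B), as {(a, b): prob},
    at treatments (1,1), (1,2), (2,1), (2,2) respectively.
    ASSUMPTION: marginal selectivity holds, so each marginal can be read
    from either of the two tables that share the relevant factor level.
    """
    a1 = p[(1, 1)] + p[(1, 2)]    # a_{1.}  : Pr[A = 1] at alpha = 1
    a1p = r[(1, 1)] + r[(1, 2)]   # a'_{1.} : Pr[A = 1] at alpha = 2
    b1 = p[(1, 1)] + p[(2, 1)]    # b_{.1}  : Pr[B = 1] at beta = 1
    b1p = q[(1, 1)] + q[(2, 1)]   # b'_{.1} : Pr[B = 1] at beta = 2
    p11, q11, r11, s11 = p[(1, 1)], q[(1, 1)], r[(1, 1)], s[(1, 1)]
    return [
        p11 + r11 + s11 - q11 - a1p - b1,
        q11 + s11 + r11 - p11 - a1p - b1p,
        r11 + p11 + q11 - s11 - a1 - b1,
        s11 + q11 + p11 - r11 - a1 - b1p,
    ]


def satisfies_fine(p, q, r, s, tol=1e-12):
    """True if all four expressions lie in [-1, 0], up to a small tolerance."""
    return all(-1.0 - tol <= v <= tol for v in fine_expressions(p, q, r, s))


def table(p11, p12, p21, p22):
    """Helper building a 2 x 2 joint distribution keyed by (a, b)."""
    return {(1, 1): p11, (1, 2): p12, (2, 1): p21, (2, 2): p22}


# Second example from the text: treatments (1,1), (1,2), (2,1), (2,2).
second_example = (
    table(.450, .050, .050, .450),
    table(.105, .395, .395, .105),
    table(.170, .330, .330, .170),
    table(.110, .390, .390, .110),
)
# Hypothetical contrast case: the same uniform table at every treatment.
uniform = tuple(table(.25, .25, .25, .25) for _ in range(4))

print(fine_expressions(*second_example), satisfies_fine(*second_example))
print(fine_expressions(*uniform), satisfies_fine(*uniform))
```

For the second example the second expression falls below -1, which agrees with the linear programming result that no JDC-vector exists there; the uniform case satisfies all four double inequalities.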

The same mechanical derivation can be used for larger entanglement problems. Once such a
system of inequalities $S$ is derived, one can use it to prove necessity (or sufficiency) of any other
system $S'$ by showing, with the aid of a linear programming algorithm, that $S'$ is redundant
when added to $S$ (respectively, $S$ is redundant when added to $S'$). But given a set of numerical
(experimentally estimated or theoretical) probabilities, computing $C(P)$ is always preferable to
dealing with explicit inequalities, as their number becomes very large even for moderate-size vectors
$P$. While the inequalities for $n = 2$, $k_1 = k_2 = 2$, $m_1 = m_2 = 2$, assuming that the marginal
selectivity equalities hold, number just 8, already for $n = 2$, $k_1 = k_2 = 2$ with $m_1 = m_2 = 3$
(describing, e.g., an EPR experiment with two spin-1 particles, or two spin-1/2 ones and inefficient
detectors), our computations yield 1080 inequalities equivalent to $C(P) = 1$, and for $n = 3$,
$k_1 = k_2 = k_3 = 2$ and $m_1 = m_2 = m_3 = 2$, corresponding to the Greenberger-Horne-Zeilinger
paradigm with three spin-1/2 particles, this number is 53792.

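The inequalities in (3) can also be derived using the classification distances discussed in Section
2.2. Consider the chain inequalities for the order-distance $D_1$ obtained by putting $\bullet = \sqcup = 1$,
$\circ = \sqcap = 2$, and identifying $\preceq$ with $\leq$:

$$
\begin{aligned}
q_{12} = D_1(H^1_x, H^2_{y'}) &\leq D_1(H^1_x, H^2_y) + D_1(H^2_y, H^1_{x'}) + D_1(H^1_{x'}, H^2_{y'}) = p_{12} + r_{21} + s_{12}, \\
p_{12} = D_1(H^1_x, H^2_y) &\leq D_1(H^1_x, H^2_{y'}) + D_1(H^2_{y'}, H^1_{x'}) + D_1(H^1_{x'}, H^2_y) = q_{12} + s_{21} + r_{12}, \\
s_{12} = D_1(H^1_{x'}, H^2_{y'}) &\leq D_1(H^1_{x'}, H^2_y) + D_1(H^2_y, H^1_x) + D_1(H^1_x, H^2_{y'}) = r_{12} + p_{21} + q_{12}, \\
r_{12} = D_1(H^1_{x'}, H^2_y) &\leq D_1(H^1_{x'}, H^2_{y'}) + D_1(H^2_{y'}, H^1_x) + D_1(H^1_x, H^2_y) = s_{12} + q_{21} + p_{12}.
\end{aligned}
\tag{4}
$$

Consider also the inequalities for the order-distance $D_2$ obtained by putting $\bullet = \sqcap = 1$, $\circ = \sqcup = 2$,
and identifying $\preceq$ with $\leq$:

$$
\begin{aligned}
q_{11} = D_2(H^1_x, H^2_{y'}) &\leq D_2(H^1_x, H^2_y) + D_2(H^2_y, H^1_{x'}) + D_2(H^1_{x'}, H^2_{y'}) = p_{11} + r_{22} + s_{11}, \\
p_{11} = D_2(H^1_x, H^2_y) &\leq D_2(H^1_x, H^2_{y'}) + D_2(H^2_{y'}, H^1_{x'}) + D_2(H^1_{x'}, H^2_y) = q_{11} + s_{22} + r_{11}, \\
s_{11} = D_2(H^1_{x'}, H^2_{y'}) &\leq D_2(H^1_{x'}, H^2_y) + D_2(H^2_y, H^1_x) + D_2(H^1_x, H^2_{y'}) = r_{11} + p_{22} + q_{11}, \\
r_{11} = D_2(H^1_{x'}, H^2_y) &\leq D_2(H^1_{x'}, H^2_{y'}) + D_2(H^2_{y'}, H^1_x) + D_2(H^1_x, H^2_y) = s_{11} + q_{22} + p_{11}.
\end{aligned}
\tag{5}
$$

It was shown that

    Each right-hand inequality in (3) is equivalent to the corresponding chain inequality
    in (4) for the order-distance $D_1$. Each left-hand inequality in (3) is equivalent to the
    corresponding chain inequality in (5) for the order-distance $D_2$.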
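
For the first line of (3) the equivalence can be verified by direct substitution. The following is a sketch, assuming marginal selectivity, so that $a_{1\cdot} = p_{11} + p_{12} = q_{11} + q_{12}$, $a'_{1\cdot} = r_{11} + r_{12} = s_{11} + s_{12}$, and $b_{\cdot 1} = p_{11} + p_{21} = r_{11} + r_{21}$:

```latex
% Right-hand inequality, from the first chain inequality for D_1:
\begin{aligned}
q_{12} \leq p_{12} + r_{21} + s_{12}
&\iff a_{1\cdot} - q_{11} \leq (a_{1\cdot} - p_{11}) + (b_{\cdot 1} - r_{11}) + (a'_{1\cdot} - s_{11}) \\
&\iff p_{11} + r_{11} + s_{11} - q_{11} - a'_{1\cdot} - b_{\cdot 1} \leq 0 .
\end{aligned}

% Left-hand inequality, from the first chain inequality for D_2,
% using r_{22} = 1 - a'_{1\cdot} - b_{\cdot 1} + r_{11}:
\begin{aligned}
q_{11} \leq p_{11} + r_{22} + s_{11}
&\iff q_{11} \leq p_{11} + s_{11} + r_{11} + 1 - a'_{1\cdot} - b_{\cdot 1} \\
&\iff -1 \leq p_{11} + r_{11} + s_{11} - q_{11} - a'_{1\cdot} - b_{\cdot 1} .
\end{aligned}
```

The remaining three lines follow in the same way, with the roles of $p$, $q$, $r$, $s$ and of the primed and unprimed marginals permuted accordingly.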

2.5  Sample-level  tests 

The set of vectors $P$ for which the system (2) has a solution forms a convex polytope. Recently
Clintin Davis-Stober (J. Math. Psych., 53, 1-13, 2009) developed a statistical theory for testing
the hypothesis that a vector of probabilities $P$ (not necessarily of the same structure as in LFT)
belongs to a convex polytope $\mathcal{P}$ against the hypothesis that it does not. Under certain regularity
constraints he derived the asymptotic distribution (a convex mixture of chi-square distributions)
for the log maximum likelihood ratio statistic

$$
\log \frac{\max_{P \in \mathcal{P}} L(P|N)}{\max_{P} L(P|N)},
$$

where $N$ is the vector of observed absolute frequencies, comprised of the numbers of occurrences
of $(l_1, \ldots, l_n; j_1, \ldots, j_n)$ in the case of LFT. The likelihoods $L(P|N)$ were computed using the
standard theory of multinomial distributions. Due to this development the statistical aims of this
part of the project were deemphasized.
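
To make the ingredients of this statistic concrete, the minimal Python sketch below evaluates the log likelihood ratio at a single candidate point of the polytope, treating the data as independent multinomial samples, one per treatment. It does not perform the maximization over $\mathcal{P}$, so the value it returns is only a lower bound on the statistic above. The counts are hypothetical, and the function names are illustrative assumptions rather than part of Davis-Stober's procedure.

```python
import math


def log_likelihood(probs, counts):
    """Multinomial log-likelihood kernel, sum of n * log(p) over cells.

    The multinomial coefficient is omitted because it cancels in the ratio.
    """
    total = 0.0
    for p, n in zip(probs, counts):
        if n > 0:
            if p <= 0.0:
                return -math.inf   # an observed cell has zero probability
            total += n * math.log(p)
    return total


def log_likelihood_ratio(candidate, counts):
    """log[ L(candidate | N) / max_P L(P | N) ], summed over treatments.

    candidate and counts map each treatment to a list of cell values; the
    unrestricted maximum is attained at the observed relative frequencies.
    """
    result = 0.0
    for treatment, n_cells in counts.items():
        n_total = sum(n_cells)
        mle = [n / n_total for n in n_cells]
        result += (log_likelihood(candidate[treatment], n_cells)
                   - log_likelihood(mle, n_cells))
    return result


treatments = [(1, 1), (1, 2), (2, 1), (2, 2)]
# Candidate inside the polytope: the same uniform table at every treatment.
uniform = {t: [0.25, 0.25, 0.25, 0.25] for t in treatments}
# HYPOTHETICAL frequencies, 100 trials per treatment (not from the report).
counts = {t: [30, 20, 20, 30] for t in treatments}

llr = log_likelihood_ratio(uniform, counts)
print(llr, -2.0 * llr)
```

The quantity `-2.0 * llr` is the familiar likelihood-ratio (G-squared) form; its reference distribution under the polytope hypothesis is the chi-square mixture mentioned above, not a single chi-square.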

Other approaches readily suggest themselves. One of them is to use the known theory of
$L(P|N) / \max_{P} L(P|N)$ to compute a confidence region of possible probability vectors $P$ for a
given empirical vector $N$. The hypothesis of selective influences is retained or rejected according
as this confidence region contains or does not contain a point $P$ that passes LFT. Resampling
techniques are another obvious approach, e.g., the permutation test in which the assignment of
empirical distributions to different treatments is randomly "reshuffled" so that each distribution
generally ends up assigned to a "wrong" treatment. If the proportion of the permuted assignments
whose deviation from the LFT polytope does not exceed that of the observed estimate of $P$ is
sufficiently small, the hypothesis of selective influences can be considered supported.

3  Conclusion 

Within  the  framework  of  this  part  of  the  project, 

•  a  general  mathematical  theory  of  selective  influences  was  elaborated  (which  input  influences 
which  of  probabilistically  interdependent  random  outputs); 

•  the  Joint  Distribution  Criterion  was  formulated  in  complete  generality; 

• a theory of pseudo-quasi-metrics was constructed to be used to test for selectiveness of influences;

• a Linear Feasibility Test for selective influences with finite-valued random outputs was constructed;

• a formal equivalence of selective influences with the issue of quantum entanglement in physics
was established, with non-commuting measurements in quantum physics paralleling the mutually
exclusive values of inputs (external factors) in behavioral sciences.